A comparison of bootstrap methods and an adjusted bootstrap approach for estimating prediction error in microarray classification

Short title: Bootstrap Prediction Error Estimation
SUMMARY This paper first provides a critical review of some existing methods for estimating prediction error in classifying microarray data, where the number of genes greatly exceeds the number of specimens. Special attention is given to bootstrap-related methods. When the sample size n is small, we find that all the reviewed methods suffer from either substantial bias or variability. We introduce a repeated leave-one-out bootstrap method that predicts for each specimen in the sample using bootstrap learning sets of size ln. We then propose an adjusted bootstrap method that fits a learning curve to the repeated leave-one-out bootstrap estimates calculated with different bootstrap learning set sizes. The adjusted bootstrap method is robust across the situations we investigate and provides a slightly conservative estimate of the prediction error. Even with small samples, it does not suffer from the large upward bias of the leave-one-out bootstrap and the .632+ bootstrap, nor from the large variability of leave-one-out cross-validation in microarray applications.
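The two-step procedure the summary describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `fit` callable (trains a classifier and exposes `predict`), the number of bootstrap draws `B`, and the inverse-power form of the learning curve are all assumptions made here for concreteness.

```python
import numpy as np

def rloob_error(X, y, fit, size, B=50, rng=None):
    """Repeated leave-one-out bootstrap (sketch).

    For each specimen i, draw B bootstrap learning sets of the given
    size (with replacement) from the remaining n-1 specimens, train on
    each, predict specimen i, and average the misclassification rate.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    errors = []
    for i in range(n):
        others = np.delete(np.arange(n), i)
        miss = 0
        for _ in range(B):
            idx = rng.choice(others, size=size, replace=True)
            model = fit(X[idx], y[idx])
            miss += model.predict(X[i].reshape(1, -1))[0] != y[i]
        errors.append(miss / B)
    return float(np.mean(errors))

def adjusted_bootstrap(X, y, fit, sizes, B=50, rng=None):
    """Adjusted bootstrap (sketch): fit a learning curve to RLOOB
    estimates at several learning-set sizes, then read the fitted
    curve off at the actual sample size n.  The curve e(m) = a + b*m^(-c)
    is an illustrative choice; the paper's parametrisation may differ.
    """
    from scipy.optimize import curve_fit  # assumes SciPy is available
    n = len(y)
    errs = [rloob_error(X, y, fit, m, B, rng) for m in sizes]
    curve = lambda m, a, b, c: a + b * np.power(m, -c)
    (a, b, c), _ = curve_fit(curve, sizes, errs,
                             p0=[0.1, 1.0, 0.5], maxfev=10000)
    return float(curve(n, a, b, c))
```

Varying `size` in `rloob_error` is what produces the points the learning curve is fitted through; evaluating the curve at n is the adjustment that removes the downward-training-set-size bias of a single bootstrap learning set.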
Similar articles
Semiparametric Bootstrap Prediction Intervals in Time Series
One of the main goals of studying a time series is estimating a prediction interval based on an observed sample path of the process. In recent years, different semiparametric bootstrap methods have been proposed to construct prediction intervals without any assumption on the error distribution. In semiparametric bootstrap methods, a linear process is approximated by an autoregressive process. The...
Ideal bootstrap estimation of expected prediction error for k-nearest neighbor classifiers: Applications for classification and error assessment
Euclidean distance k-nearest neighbor (k-NN) classifiers are simple nonparametric classification rules. Bootstrap methods, widely used for estimating the expected prediction error of classification rules, are motivated by the objective of calculating the ideal bootstrap estimate of expected prediction error. In practice, bootstrap methods use Monte Carlo resampling to estimate the ideal boot...
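The quantity this snippet discusses can be approximated with a short Monte Carlo sketch: a bootstrap estimate of expected prediction error for a Euclidean 1-NN classifier, where each resample trains the rule and the out-of-bag specimens are used as test points. The function name and the choice of `B` are illustrative; the ideal bootstrap estimate is the limit this Monte Carlo average approaches as `B` grows.

```python
import numpy as np

def boot_1nn_error(X, y, B=200, rng=None):
    """Monte Carlo bootstrap estimate of 1-NN prediction error (sketch)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    err = []
    for _ in range(B):
        idx = rng.choice(n, n, replace=True)           # bootstrap learning set
        out = np.setdiff1d(np.arange(n), idx)          # out-of-bag test points
        for i in out:
            d = np.linalg.norm(X[idx] - X[i], axis=1)  # Euclidean distances
            err.append(y[idx][d.argmin()] != y[i])     # 1-NN misclassification
    return float(np.mean(err))
```

Because a k-NN prediction depends only on which specimens enter the bootstrap sample, the ideal (infinite-B) estimate can in principle be computed from bootstrap inclusion probabilities rather than resampling, which is the motivation the snippet alludes to.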
Statistical Topology Using the Nonparametric Density Estimation and Bootstrap Algorithm
This paper presents approximate confidence intervals for functions of parameters in a Banach space based on a bootstrap algorithm. We apply a kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality of the distribution function estimator of random variables using the integrated mean square error (IMSE). The results of simulation studies show a significant impro...
Non-Bayesian Estimation and Prediction under Weibull Interval Censored Data
In this paper, a one-sample point predictor of the random variable X is studied, where X is the occurrence of an event within successive visits $L_i$ and $R_i$, $i=1,2,\ldots,n$ (interval censoring). Our proposed method is based on finding the expected value of the conditional distribution of X given $L_i$ and $R_i$, $i=1,2,\ldots,n$. To make the desired prediction, our approach is based on approximating the...
Estimation in Simple Step-Stress Model for the Marshall-Olkin Generalized Exponential Distribution under Type-I Censoring
This paper considers the simple step-stress model for the Marshall-Olkin generalized exponential distribution when there is a time constraint on the duration of the experiment. The maximum likelihood equations for estimating the parameters, assuming a cumulative exposure model with Marshall-Olkin generalized exponential lifetimes, are derived. The likelihood equations do not lea...